36 research outputs found
Improvements on automatic speech segmentation at the phonetic level
In this paper, we present some recent improvements to our automatic speech segmentation system, which needs only the speech signal and the phonetic sequence of each sentence of a corpus to be trained. It estimates a GMM using all the sentences of the training subcorpus, where each Gaussian distribution represents an acoustic class. The probability densities of these classes are combined with a set of conditional probabilities in order to estimate the probability densities of the states of each phonetic unit. The initial values of the conditional probabilities are obtained from a segmentation of each sentence that assigns the same number of frames to each phonetic unit. A DTW algorithm fixes the phonetic boundaries using the known phonetic sequence. This DTW is one step inside an iterative process that segments the corpus and re-estimates the conditional probabilities. The results presented here demonstrate that the system has a good capacity to learn how to identify the phonetic boundaries. © 2011 Springer-Verlag.
This work was supported by the Spanish MICINN under contract TIN2008-06856-C05-02.
Gómez Adrian, JA.; Calvo Lance, M. (2011). Improvements on automatic speech segmentation at the phonetic level. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer Verlag (Germany). 7042:557-564. https://doi.org/10.1007/978-3-642-25085-9_66
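Two steps from the segmentation abstract above can be sketched in a few lines of Python: the equal-length initial segmentation, and the combination of acoustic-class densities with conditional probabilities. This is only an illustration under stated assumptions. The function names are hypothetical, and the mixture form p(x | state) = Σ_c P(c | state) · p(x | c) is an assumed reading of "combined with a set of conditional probabilities", not a formula taken from the paper.

```python
import numpy as np

def uniform_segmentation(num_frames, num_units):
    """Initial segmentation: give each phonetic unit (almost) the same number of frames.

    Returns num_units + 1 boundaries; unit k covers frames [b[k], b[k + 1]).
    """
    return np.linspace(0, num_frames, num_units + 1).round().astype(int)

def state_densities(class_densities, cond_probs):
    """Combine acoustic-class densities into per-state densities.

    class_densities: (T, C) array with p(x_t | class c) for every frame t.
    cond_probs:      (S, C) array with P(c | state s); each row sums to 1.
    Returns a (T, S) array with sum_c P(c | s) * p(x_t | c)  (assumed mixture form).
    """
    return class_densities @ cond_probs.T

# Hypothetical toy values: one frame, two acoustic classes, two states.
boundaries = uniform_segmentation(12, 4)
densities = state_densities(np.array([[0.2, 0.8]]),
                            np.array([[1.0, 0.0], [0.5, 0.5]]))
```

In an iterative scheme like the one described, the boundaries would be re-estimated by an alignment step and the conditional probabilities recomputed from the new segmentation. That loop is omitted here.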
Use of active methodologies in the introduction of IIP in the Computer Science degree at the UPV (Uso de metodologías activas en la implantación de IIP en el grado en informática de la UPV)
This paper describes the experience of introducing the first-year subject Introduction to Computer Science and Programming (IIP) in the Computer Science degree at the School of Computer Science (ETSINF) of the Universitat Politècnica de València (UPV). It highlights the use of active teaching and learning methodologies that incorporate group work, the design of an evaluation method consistent with the methodology applied, and the integration of technological tools to support teaching. Additionally, the positive and negative aspects of the experience are described and evidenced, from the point of view of both the teacher and the student.
Peer reviewed
ELIRF at MEDIAEVAL 2013: Spoken Web Search Task
In this paper, we present the systems that the Natural Language Engineering and Pattern Recognition group (ELiRF) has submitted to the MediaEval 2013 Spoken Web Search task. All of them are based on a Subsequence Dynamic Time Warping algorithm and are zero-resource systems.
Work funded by the Spanish Government and the E.U. under contract TIN2011-28169-C05 and FPU Grant AP2010-4193.
Gómez Adrian, JA.; Hurtado Oliver, LF.; Calvo Lance, M.; Sanchís Arnal, E. (2013). ELIRF at MEDIAEVAL 2013: Spoken Web Search Task. CEUR Workshop Proceedings. 1042:59-60. http://hdl.handle.net/10251/38157
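As a rough illustration of the Subsequence Dynamic Time Warping idea named in this abstract, the sketch below finds the best-matching region of a long utterance for a short query. It is a generic textbook formulation, not the ELiRF systems themselves. The Euclidean local distance, the three step directions and the function name `subsequence_dtw` are all assumptions.

```python
import numpy as np

def subsequence_dtw(query, utterance):
    """Find where `query` best matches inside `utterance`.

    Both inputs are arrays of shape (frames, dims).
    Returns (accumulated_cost, start_frame, end_frame) in utterance frames.
    """
    n, m = len(query), len(utterance)
    # Pairwise Euclidean distance between every query and utterance frame.
    dist = np.linalg.norm(query[:, None, :] - utterance[None, :, :], axis=2)
    acc = np.full((n, m), np.inf)
    start = np.zeros((n, m), dtype=int)
    # The match may begin at any utterance frame at no extra cost.
    acc[0, :] = dist[0, :]
    start[0, :] = np.arange(m)
    for i in range(1, n):
        for j in range(m):
            candidates = [(acc[i - 1, j], start[i - 1, j])]
            if j > 0:
                candidates.append((acc[i, j - 1], start[i, j - 1]))
                candidates.append((acc[i - 1, j - 1], start[i - 1, j - 1]))
            best_cost, best_start = min(candidates, key=lambda c: c[0])
            acc[i, j] = dist[i, j] + best_cost
            start[i, j] = best_start
    # The match may also end at any utterance frame.
    end = int(np.argmin(acc[-1]))
    return float(acc[-1, end]), int(start[-1, end]), end

# Hypothetical one-dimensional features: the query occurs at frames 2..4.
query = np.array([[1.0], [2.0], [3.0]])
utterance = np.array([[9.0], [9.0], [1.0], [2.0], [3.0], [9.0]])
cost, first, last = subsequence_dtw(query, utterance)
```

A practical detector would usually normalize the accumulated cost by path length and return several non-overlapping candidates. Both refinements are left out to keep the sketch short.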
Fretting: review on the numerical simulation and modelling of wear, fatigue and fracture
This chapter presents the general background and the state of the art of numerical simulation and modeling of the fretting phenomenon in terms of wear, fatigue and fracture. First, fretting and its implications are introduced. Second, different methodologies for wear modeling and simulation are described and discussed. Afterwards, fatigue and fracture analysis approaches are reviewed. To that end, multiaxial fatigue parameters are introduced, with emphasis on the physical basis of the fretting phenomenon and the suitability of each model. In addition, analysis methods for the propagation phase based on linear elastic fracture mechanics (LEFM), via the finite element method (FEM) and the eXtended finite element method (X-FEM), are presented and compared. Finally, different approaches and the latest developments for fretting fatigue lifetime prediction are presented and discussed.
A phonetic-based approach to query-by-example spoken term detection
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-41822-8_63
Query-by-Example Spoken Term Detection (QbE-STD) tasks are usually addressed by representing speech signals as a sequence of feature vectors by means of a parametrization step, and then using a pattern matching technique to find the candidate detections. In this paper, we propose a phoneme-based approach in which the acoustic frames are first converted into vectors representing the a posteriori probabilities for every phoneme. This strategy is especially useful when the language of the task is known a priori. Then, we show how this representation can be used for QbE-STD using both a Segmental Dynamic Time Warping algorithm and a graph-based method. The proposed approach has been evaluated on a QbE-STD task in Spanish, and the results show that it can be an adequate strategy for tackling this kind of problem.
Work partially supported by the Spanish Ministerio de Economía y Competitividad under contract TIN2011-28169-C05-01 and FPU Grant AP2010-4193, and by the Vic. d’Investigació of the UPV (PAID-06-10).
Hurtado Oliver, LF.; Calvo Lance, M.; Gómez Adrian, JA.; García Granada, F.; Sanchís Arnal, E. (2013). A phonetic-based approach to query-by-example spoken term detection. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer Verlag (Germany). 8529:504-511. https://doi.org/10.1007/978-3-642-41822-8_63
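To make the phoneme-posterior representation in this abstract more concrete, the sketch below computes a frame-to-frame distance matrix between two posteriorgrams, which a DTW-style matcher could then consume. The abstract does not say which local distance the authors use. The negative log of the dot product is a common choice in the posteriorgram literature and is used here purely as an assumption. The function name and the `eps` floor are hypothetical.

```python
import numpy as np

def posteriorgram_distance(query_post, utt_post, eps=1e-10):
    """Pairwise distance between two phoneme posteriorgrams.

    query_post: (n, P) array, utt_post: (m, P) array; each row holds the
    a posteriori probabilities of the P phonemes for one frame (rows sum to 1).
    Returns an (n, m) matrix of -log(q_i . u_j), floored at eps to avoid log(0).
    """
    similarity = query_post @ utt_post.T
    return -np.log(np.maximum(similarity, eps))

# Hypothetical toy posteriorgram with two phonemes and fully confident frames.
frames = np.array([[1.0, 0.0], [0.0, 1.0]])
distances = posteriorgram_distance(frames, frames)
```

Frames that agree on the phoneme get a distance close to zero, while frames that put their probability mass on different phonemes get a large distance. This is what makes the representation convenient when the language, and therefore the phoneme set, is known in advance.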
Graph theory models applied to programming competition problems
The subject Algorithms for Problem Solving (CP) of the Bachelor's Degree in Computer Science at the ETSINF is geared towards solving programming challenges of the kind usually set in programming competitions such as the Southwestern Europe Regional Contest (SWERC), in which ETSINF students have participated regularly in recent years. Solving such a problem requires building a suitable model for it, finding the optimal solution via this model, and being able to program it without bugs in a short period of time. Skill in solving these problems is given considerable weight in the recruiting processes of big technology companies such as Google, Apple, Yahoo, Microsoft or Facebook.
We show a collaboration between two elective subjects of this degree: Algorithms for Problem Solving (CP) and Graphs, Models, and Applications (GMA). This collaboration was proposed by students who had taken both subjects simultaneously. The goals are to redirect part of the contents of GMA towards the analysis of models that frequently appear in this type of problem, and to help students face these challenges. The methodology consists of posing several problems from the point of view of both subjects. The first impressions concerning the innovation are positive.
Project funded by the Universitat Politècnica de València (PIME-B08).
Jordan Lluch, C.; Gómez Adrian, JA.; Calvo Lance, M.; Conejero Casares, JA. (2016). Modelos de la teoría de grafos aplicados a problemas de competiciones de programación. In In-Red 2016. II Congreso nacional de innovación educativa y docencia en red. Editorial Universitat Politècnica de València. https://doi.org/10.4995/INRED2016.2016.4327
An algorithm for automatic speech understanding over word graphs
In this work we propose an algorithm for automatic speech understanding that takes a word graph as its input. First, this word graph is processed by a dynamic programming algorithm, which produces a second graph enriched with semantic information. Computing the best path over this second graph yields the most likely concept sequence given the acoustic evidence reflected in the input word graph. The semantic decoding also yields the word sequence associated with that concept sequence, together with its semantic segmentation.
Calvo Lance, M.; Gómez Adrian, JA.; Sanchís Arnal, E.; Hurtado Oliver, LF. (2012). Un algoritmo para la comprensión automática del habla sobre grafos de palabras. PROCESAMIENTO DEL LENGUAJE NATURAL. (48):105-112. http://hdl.handle.net/10251/28874
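The "best path over a graph" step mentioned in this abstract can be illustrated with a small dynamic-programming routine over a directed acyclic graph whose edges carry a label and a log-probability. This is a generic sketch, not the authors' algorithm. The edge format, the assumption that nodes are numbered in topological order, and the function name `best_path` are all hypothetical.

```python
import math
from collections import defaultdict

def best_path(num_nodes, edges):
    """Highest-scoring path from node 0 to node num_nodes - 1 in a DAG.

    edges: list of (src, dst, label, log_prob) with src < dst, so that
    visiting nodes in increasing order is a valid topological order.
    Returns (score, labels along the best path).
    """
    outgoing = defaultdict(list)
    for src, dst, label, log_prob in edges:
        outgoing[src].append((dst, label, log_prob))
    best = [-math.inf] * num_nodes
    back = [None] * num_nodes
    best[0] = 0.0
    for node in range(num_nodes):
        if best[node] == -math.inf:
            continue  # node not reachable from the start
        for dst, label, log_prob in outgoing[node]:
            score = best[node] + log_prob
            if score > best[dst]:
                best[dst] = score
                back[dst] = (node, label)
    # Follow the back-pointers from the final node to recover the labels.
    labels = []
    node = num_nodes - 1
    while back[node] is not None:
        node, label = back[node]
        labels.append(label)
    return best[-1], labels[::-1]

# Hypothetical toy graph with made-up labels and probabilities.
toy_edges = [
    (0, 1, "a", math.log(0.6)),
    (0, 1, "b", math.log(0.4)),
    (1, 2, "c", math.log(0.5)),
    (0, 2, "d", math.log(0.2)),
]
score, labels = best_path(3, toy_edges)
```

In a setting like the one described, the labels on the second graph would carry concepts together with their word segments, so the same back-pointer walk would recover both the concept sequence and its segmentation.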
Multimodal dialog system based on statistical models
In this paper, we present a multimodal dialog system. In addition to input and output multimodality, the main feature of the system is that its key modules are based on statistical models.
Work partially funded by the Spanish government under project TIN2008-06856-C05-02 and by the Universitat Politècnica de València under project 20100982.
ConvDTW-ACS: Audio Segmentation for Track Type Detection During Car Manufacturing
This paper proposes a method for Acoustic Constrained Segmentation (ACS) in
audio recordings of vehicles driven through a production test track, delimiting
the boundaries of surface types in the track. ACS is a variant of classical
acoustic segmentation where the sequence of labels is known, contiguous and
invariable, which is especially useful in this work as the test track has a
standard configuration of surface types. The proposed ConvDTW-ACS method
utilizes a Convolutional Neural Network for classifying overlapping image
chunks extracted from the full audio spectrogram. Then, our custom Dynamic Time
Warping algorithm aligns the sequence of predicted probabilities to the
sequence of surface types in the track, from which timestamps of the surface
type boundaries can be extracted. The method was evaluated on a real-world
dataset collected from the Ford Manufacturing Plant in Valencia (Spain),
achieving a mean error of 166 milliseconds when delimiting, within the audio,
the boundaries of the surfaces in the track. The results demonstrate the
effectiveness of the proposed method in accurately segmenting different surface
types, which could enable the development of more specialized AI systems to
improve the quality inspection process.
Comment: 12 pages, 2 figures
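The alignment step described in this abstract, matching per-chunk class probabilities to a known, fixed order of surface types, can be sketched as a monotonic dynamic-programming alignment. This is not the paper's custom DTW, whose details the abstract does not give. The negative log-probability cost, the stay-or-advance step pattern and the function name `align_to_sequence` are assumptions, and the sketch requires at least as many chunks as labels.

```python
import numpy as np

def align_to_sequence(probs, label_sequence, eps=1e-10):
    """Align T probability vectors to a known ordered list of labels.

    probs: (T, K) array of per-chunk class probabilities.
    label_sequence: class indices in the order they must occur (len <= T).
    Returns the chunk index at which each label's segment starts.
    """
    T, S = len(probs), len(label_sequence)
    # Cost of assigning chunk t to the s-th label in the sequence.
    cost = -np.log(np.maximum(probs[:, label_sequence], eps))
    acc = np.full((T, S), np.inf)
    stay = np.zeros((T, S), dtype=bool)
    acc[0, 0] = cost[0, 0]  # the first chunk must belong to the first label
    for t in range(1, T):
        for s in range(min(t + 1, S)):
            stay_cost = acc[t - 1, s]
            move_cost = acc[t - 1, s - 1] if s > 0 else np.inf
            stay[t, s] = stay_cost <= move_cost
            acc[t, s] = cost[t, s] + min(stay_cost, move_cost)
    # Backtrack from the last chunk, which must belong to the last label.
    starts = [0] * S
    s = S - 1
    for t in range(T - 1, 0, -1):
        if not stay[t, s]:
            starts[s] = t
            s -= 1
    return starts

# Hypothetical predictions for four chunks and two surface types.
toy_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
segment_starts = align_to_sequence(toy_probs, [0, 1])
```

Multiplying each start index by the hop between consecutive chunks would give boundary timestamps in seconds. That hop depends on how the spectrogram is chunked, which is not specified here.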